savingSessions: tweet analysis

Author

Ben Anderson (@dataknut)

Published

December 13, 2022

1 Background

UK demand response experiments run by NG-ESO and retailers such as @OctopusEnergy.

An attempt to do some analysis of #savingSession(s) tweets.

Inspired by https://docs.ropensci.org/rtweet/

Last run at: 2022-12-13 17:47:55

2 Setup

Part of https://github.com/dataknut/savingSessions

Makes use of https://github.com/dataknut/hashTagR, a DIY wrapper for the rtweet rstats package.

data.table   hashTagR    ggplot2  lubridate      readr     rtweet  tidyverse 
      TRUE       TRUE       TRUE       TRUE       TRUE       TRUE       TRUE 
  tidytext  wordcloud 
      TRUE       TRUE 

Grab the most recent set of tweets that mention #savingSession OR #savingSessions OR #savingsession using the rtweet::search_tweets() function and merge with any we may already have downloaded.
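A minimal sketch of that search-and-merge step, assuming rtweet and data.table; the `n`, `include_rts` choices, the saved-file path and the `status_id` column name are illustrative, not necessarily those used by the actual code:

```r
library(rtweet)
library(data.table)

# search the recent-tweets API for any of the hashtag variants;
# n and include_rts here are illustrative choices
new_tweets <- search_tweets(
  "#savingSession OR #savingSessions OR #savingsession",
  n = 18000,
  include_rts = FALSE
)

# merge with previously saved tweets (hypothetical file) and
# drop duplicates by tweet id (column name varies by rtweet version)
old_tweets <- fread("previous_tweets.csv")
all_tweets <- unique(
  rbindlist(list(old_tweets, as.data.table(new_tweets)), fill = TRUE),
  by = "status_id"
)
```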

Should we also try to get all replies to @savingSessions?

Note that tweets do not seem to be available after ~ 14 days via the API used by rtweet. Best to keep refreshing the data every week…

[1] "Found 71 files matching *.csv in ~/Dropbox/data/twitter/savingSessions/"

That produced a data file of 3323 tweets.

We do NOT store the tweets in the repo for both ethical and practical reasons…

Note also that we may not be collecting the complete dataset of hashtagged tweets due to the intricacies of the Twitter API.

3 Analysis

Figure 1 shows the timing of tweets by hour.

Figure 1: Tweets over time
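A hedged sketch of how a tweets-per-hour plot like Figure 1 could be built with ggplot2 and lubridate; the `all_tweets` data.table and its `created_at` column are assumed names:

```r
library(ggplot2)
library(lubridate)
library(data.table)

# truncate each tweet's timestamp to the hour and count tweets per hour;
# assumes a data.table `all_tweets` with a POSIXct `created_at` column
hourly <- all_tweets[, .N, keyby = .(hour = floor_date(created_at, "hour"))]

ggplot(hourly, aes(x = hour, y = N)) +
  geom_col() +
  labs(x = "Time", y = "Tweets per hour")
```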

Figure 2 shows cumulative tweets by hour.

Figure 2: Cumulative number of tweets over time
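The cumulative version (Figure 2) only adds a running total; again a sketch assuming an `all_tweets` data.table with a `created_at` column:

```r
library(ggplot2)
library(lubridate)
library(data.table)

# per-hour counts, then a cumulative sum over time
hourly <- all_tweets[, .N, keyby = .(hour = floor_date(created_at, "hour"))]
hourly[, cumN := cumsum(N)]

ggplot(hourly, aes(x = hour, y = cumN)) +
  geom_line() +
  labs(x = "Time", y = "Cumulative tweets")
```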

We see roughly the kind of uptick in tweets for Session 2 that we saw for Session 1…

Let’s try a word cloud.

Inspiration here: https://towardsdatascience.com/create-a-word-cloud-with-r-bde3e7422e8a

Make a word cloud for all tweets

These may not render the word ‘savingsession’ itself since, due to the Twitter search pattern used, it appears in every tweet.

We need to remove common words (to, the, and, a, for, etc). These are called ‘stop words’.

What happens if we do that?
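One way to do this, sketched with tidytext’s built-in `stop_words` lexicon and the wordcloud package; the `all_tweets` data frame and its `text` column are assumed names:

```r
library(tidytext)
library(dplyr)
library(wordcloud)

# tokenise tweet text into words, drop stop words, and count frequencies;
# assumes a data frame `all_tweets` with a `text` column
words <- all_tweets |>
  unnest_tokens(word, text) |>
  anti_join(stop_words, by = "word") |>
  count(word, sort = TRUE)

# wordcloud() sizes each word by its frequency
wordcloud(words$word, words$n, max.words = 100)
```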

Not especially informative… Perhaps we should try to extract the ‘sentiment’ of the words.

Inspired by https://www.tidytextmining.com/sentiment.html

Take those cleaned words and sentiment them!

In each case we show two tallies: the number of negative and positive codings for the unique words (these add up to the number of unique words matched), and the total frequency of words coded negative or positive (these add up to the total number of matched words).
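The two tallies above can be sketched as follows, assuming a `words` data frame produced by `count(word)` (so it has `word` and `n` columns):

```r
library(tidytext)
library(dplyr)

# attach bing sentiment codings; unmatched words drop out of the join
bing <- words |>
  inner_join(get_sentiments("bing"), by = "word")

# tally 1: codings over unique words (sums to the matched unique-word count)
bing |> count(sentiment)

# tally 2: total frequency of negative vs positive words
# (sums to the total matched word count)
bing |> group_by(sentiment) |> summarise(freq = sum(n))
```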

Got it?

The first word cloud shows words that have negative sentiment (according to tidytext::get_sentiments("bing")). Remember the size of the words is relative to the count of all negative words.

[1] 525

negative positive 
     288      237 
[1] 2990
# A tibble: 2 × 2
  sentiment  freq
  <chr>     <int>
1 negative    971
2 positive   2019

The second word cloud shows words with positive sentiment. Remember the size of the words is relative to the count of all positive words.

Repeat these negative/positive word clouds but just for the first session which was on 2022-11-15.

These are just the tweets for the day of the event and the day after…
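Restricting to the event day and the day after could be done like this, assuming an `all_tweets` data.table with a POSIXct `created_at` column; the session date shown is session 1’s:

```r
library(data.table)
library(lubridate)

# keep only tweets from the session day and the following day;
# 2022-11-15 is session 1 -- swap the date for later sessions
session_date <- as.Date("2022-11-15")
session_tweets <- all_tweets[
  as.Date(created_at) %in% c(session_date, session_date + 1)
]
```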

Guess which cloud is which?


Repeat for session 2 which was on 2022-11-22.

These are just the tweets for the day of the event and the day after…


Repeat for session 3 which was on 2022-11-30.

These are just the tweets for the day of the event and the day after…


Repeat for session 4 which was on 2022-12-01.

These are just the tweets for the day of the event and the day after…


Repeat for session 5 which was on 2022-12-12.

These are just the tweets for the day of the event and the day after…
